"An equilibrium does not appear because agents are rational, but rather agents appear to be rational because an equilibrium has been reached. [...] The task for game theory is to formulate a notion of rationality."

Larry Samuelson [20, p. 3]



Abstract. Game theoretic equilibria are mathematical expressions of 
rationality. Rational agents are used to model not only humans and their 
software representatives, but also organisms, populations, species and 
genes, interacting with each other and with the environment. Rational 
behaviors are achieved not only through conscious reasoning, but also 
through spontaneous stabilization at equilibrium points. 
Formal theories of rationality are usually guided by informal intuitions, 
which are acquired by observing some concrete economic, biological, or 
network processes. Treating such processes as instances of computation, 
we reconstruct and refine some basic notions of equilibrium and rationality from some basic structures of computation.
It is, of course, well known that equilibria arise as fixed points; the point 
is that semantics of computation of fixed points seems to be providing 
novel methods, algebraic and coalgebraic, for reasoning about them. 



1 Introduction 

Game theory studies distributed processes where the resources are shared among 
the agents with different, inconsistent, and often adversarial goals. Originally 
devised as a tool of economics, politics, and warfare, game theory recently be- 
came an indispensable tool of algorithmics, especially as the processes and the 
problems of computation spread from computers to networks. The other way 
around, the algorithmic aspects of game theory have attracted a lot of attention 
on their own, leading to fruitful interactions between economics and algorithmics 
* Supported by ONR and EPSRC.



In semantics of computation, often viewed as the stylistic dual of algorithmics, the paradigm of game also played a crucial role, and led to the solutions of some deep and long standing problems [1,11]. Yet the resulting toolkit of game semantics [2] remained largely disjoint from the game theoretic methods and concerns. While this may very well be justified by the different, and perhaps even disjoint goals of game semantics and game theory, the growing importance of the computational aspects of game theory continues to spur the explorations of a different conceptual link: if gaming is computation, which semantical and programming methodologies apply to it?

The present paper provides a belated account of some initial explorations in 
this direction, going back to a joint project with Samson Abramsky. The up- 
shot is that the basic models of computation readily extend to capture the basic 
notions of game theory: the tools for reasoning about choice, be it possibilis- 
tic or probabilistic, and the tools to compute fixed points of possibilistic and 
probabilistic processes, turn out to be readily applicable to designing and pro- 
gramming strategic behavior. The approach seems promising in both directions: 
on one hand, the semantical view of games provides a convenient formal frame- 
work for conceptual analyses and concrete computations of response profiles and 
equilibria; the other way around, the game theoretic view of the computational 
processes opens an alley towards modeling a wide range of network interactions 
of increasing practical interest. 

As a running example, we use what may be the smallest and the deepest 
problem of game theory: Prisoners' Dilemma. In its standard solution, traditional 
game theory recommends selfishness as the only rational strategy here, although 
"staying the course" of this strategy leads to an ostensive loss for everyone. Can 
our semantical tools dispel the irrationality of this standard solution, and pro- 
vide a better model of rationality? We propose and analyze several refinements 
of the basic model of strategic reasoning, and show how the implementations of 
the optimization task of gaming can be refined, and their rationality improved. 
Some familiar semantical tools turn out to allow computing more informative 
equilibria, e.g. where players' preferences are quantified, rather than just par- 
tially ordered, and where the payoffs can be used dynamically (e.g. reinvested, or 
discounted), and not just accrued. This seems to suggest that applying seman- 
tical methodologies to program strategies may offer some new solutions, besides 
being fun. 

Outline of the paper. Section 2 sketches a bird's eye view of program and process semantics, and points to the place of games in that landscape. In section 3 we reconstruct the familiar notions of Nash equilibrium and evolutionary stable strategy, as they could be obtained by running relational (nondeterministic) programs. We also discuss some nonstandard equilibrium concepts, which can be easily designed and implemented in this framework. In section 4 we lift these equilibrium concepts from the relational to a stochastic framework, where they can be obtained as stationary distributions of Markov chains. In order to remain close to the usual game-theoretic models, in both these sections games are viewed as stateless processes. In section 5 we discuss the role of state, i.e. position, in semantics of gaming. Section 6 summarizes the paper.

2 Program and process semantics of games 

Semantics of a natural language evolves through speech and through use of 
the language. Semantics of a programming language requires moreover a design 
effort, because it concerns not only communication between people, but also 
programming computers, and they need to be designed before they are built. 
However, as the notion of a computer is changing from a machine in a box to a 
world wide network, the simple notion of a program diversifies. Some programs 
acquire strategic, i.e. game theoretic aspects. We sketch a way to capture these 
aspects in a well studied framework of fixed point semantics, where coalgebras 
are always present in one way or another. 

2.1 Program semantics 

In categorical semantics, a program is denoted by an arrow $A \xrightarrow{f} B$ in a category D, where the objects A and B denote some data types, of the inputs and of the outputs of f, respectively. It is assumed that the category D has cartesian products, so that we can also represent a program $A \times C \to B \times D \times X$ with multiple inputs and multiple outputs.

But running a program does not just map data to data; it also causes a whole 
range of other observable effects. E.g., a computation may not terminate, or it 
may terminate with several possible outputs for the same input; or it may change 
the state of the computer, or of another resource. Such computational effects can 
be captured by computational monads [13]. Originally proposed as a tool of semantics, monads have been widely endorsed as a convenient programming tool [23]. In the meantime, an alternative presentation of essentially equivalent semantical structure has been proposed, in terms of premonoidal categories [19]. The category D of data types, with cartesian products and simple deterministic maps between them, is extended to a category C with the same data types as objects, but with the computations with nontrivial effects as its morphisms. Along the inclusion D ↪ C, the cartesian products of D are mapped into the premonoidal tensor products in C. The relevant semantical structure is sometimes called Freyd category [18].

While the models of games that involve some of the well-studied computa- 
tional effects seem quite interesting for future research, in the present paper we 
only consider the simplest effects of the choice operations, and in particular of
the possibilistic (relational) and probabilistic (randomized) choice. The reason
is that these choice operations already come about in game theory, so that we 
can display some familiar ideas from a slightly different angle. 

From the rich tool chest of program semantics, we shall thus consider only 
two simple but fundamental categories of computations: 



— FRel of finite sets and relations, and 

— SRel of finite sets and stochastic relations. 



A morphism in either of these categories will be denoted by a crossed arrow ⇸. While a binary relation $A \overset{R}{⇸} B$ in FRel can be viewed as a matrix $B \times A \xrightarrow{R} \{0,1\}$, a stochastic relation $A \overset{P}{⇸} B$ in SRel is a matrix $B \times A \xrightarrow{P} [0,1]$, i.e. $P = (p_{ji})_{B \times A}$, where $p_{ji} \in [0,1]$ and $\sum_{j \in B} p_{ji} = 1$ holds for all $i \in A$. Intuitively, the entry $p_{ji}$ can thus be viewed as the probability that the input $i \in A$ will result in the output $j \in B$. The composition in SRel is the matrix composition. Both categories of computations C = FRel, SRel have the same cartesian subcategory of deterministic maps D = Set. They both happen to be monoidal, rather than premonoidal.
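As a minimal sketch of the two kinds of composition just described, relations can be stored as 0/1 matrices and stochastic relations as column-stochastic matrices, both indexed as (output row, input column). The function names and the example matrices below are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def compose_rel(S, R):
    """Relational composition of 0/1 matrices.
    R is a B x A matrix, S is a C x B matrix; the result is C x A."""
    return [[int(any(S[k][j] and R[j][i] for j in range(len(R))))
             for i in range(len(R[0]))]
            for k in range(len(S))]

def compose_stoch(Q, P):
    """Composition in the stochastic case: the ordinary matrix product."""
    return [[sum(Q[k][j] * P[j][i] for j in range(len(P)))
             for i in range(len(P[0]))]
            for k in range(len(Q))]

def is_stochastic(P):
    """Every column (one per input) must sum to 1."""
    return all(sum(P[j][i] for j in range(len(P))) == 1
               for i in range(len(P[0])))

# Hypothetical example data on the two-element set {0, 1}.
R = [[1, 0],
     [1, 1]]            # input 0 relates to outputs 0 and 1, input 1 to output 1
half = Fraction(1, 2)
P = [[half, 0],
     [half, 1]]         # input 0 yields a fair coin, input 1 yields output 1

RR = compose_rel(R, R)
PP = compose_stoch(P, P)
```

Exact fractions are used so that the column sums can be compared with 1 without rounding error.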



2.2 Processes and controls 

We model processes simply as programs that depend on a state, and may change it. If the state space is represented by an object X in a category of computations C, then a process is thus a morphism $A \times X \overset{R}{⇸} B \times X$. Since every category of computations inherits along the inclusion D ↪ C the cartesian diagonals $A \to A \times A$ and the projections $A \times B \to A$ and $A \times B \to B$, we can separate the data part and the state part of a process as

- $R_B : A \times X ⇸ B \times X \to B$ and
- $R_X : A \times X ⇸ B \times X \to X$.

A process can be ongoing, and its outputs may be used to determine the inputs to be fed back into it. This is expressed through feedback $B \times X \overset{\Phi}{⇸} A$. To stabilize a process, a control $X \overset{\gamma}{⇸} A$ can be extracted as a fixed point

$$
\begin{array}{cl}
A \times X \overset{R}{⇸} B \times X \qquad B \times X \overset{\Phi}{⇸} A & \\ \hline
A \times X \overset{R}{⇸} B \times X \overset{\Phi}{⇸} A & \\ \hline
X \overset{\gamma}{⇸} A & \gamma = \mathrm{Fix}_A(\Phi \circ R)
\end{array}
$$

Such fixed point operations play a central role in modeling processes and controls. 
We shall see that they play a central role in modeling games. The fixed point 
operations in FRel and SRel are spelled out in the Appendix. 

For a categorical insider, we add that any Freyd category [18] with a family of Conway fixed point operators [6,21] should suffice for the (as yet putative) research in abstract game theory. Equivalently, a traced Freyd category will do as well [4].



Examples. The simplest example of a process is a Mealy machine. A deterministic one is simply a function $A \times X \to B \times X$. A nondeterministic (possibilistic) one is a relation $A \times X ⇸ B \times X$. A probabilistic automaton can in principle be viewed as a stochastic relation of the same type, just in SRel rather than in FRel. Other examples of processes include Markov chains... and even games.
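To make the Mealy machine example concrete, the following sketch runs a deterministic step function of type A x X -> B x X, and a possibilistic one given as a function returning a set of (output, next state) pairs. The parity machines used as data are hypothetical illustrations, not examples from the paper.

```python
def run_mealy(step, state, inputs):
    """Feed `inputs` to a deterministic Mealy machine step : A x X -> B x X."""
    outputs = []
    for a in inputs:
        b, state = step(a, state)
        outputs.append(b)
    return outputs, state

def run_relational(step_rel, states, inputs):
    """Possibilistic version: step_rel(a, x) returns a set of
    (output, next state) pairs. Returns the states reachable at the end."""
    for a in inputs:
        states = {x2 for x in states for (_, x2) in step_rel(a, x)}
    return states

# Hypothetical example: the state is the parity of the 1s seen so far,
# and the output repeats the updated parity.
def parity_step(a, x):
    x2 = x ^ a
    return x2, x2

# Hypothetical possibilistic variant: an input bit may or may not register.
def lossy_parity_step(a, x):
    return {(x, x), (x ^ a, x ^ a)}

outputs, final_state = run_mealy(parity_step, 0, [1, 0, 1, 1])
reachable = run_relational(lossy_parity_step, {0}, [1, 0])
```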



2.3 Games as processes 

An m-player game is a process $A \times X ⇸ B \times X$ where the inputs, the outputs and the state consist of m components, i.e.

$$A = \prod_{i \in m} A_i \qquad B = \prod_{i \in m} B_i \qquad X = \prod_{i \in m} X_i$$

where we represent ordinals following von Neumann, in the form $m = \{0, 1, \ldots, m-1\}$. The inputs $A_i$ are thought of as the moves available to the i-th player; the outputs $B_i$ are her payoffs; the states in $X_i$ are the positions that she can observe. The payoff types $B_i$ are usually ordered, and this ordering expresses the player's preference.

Games can thus be viewed as a special case of controllable processes, de- 
scribed in the preceding section. The optimization task of control is, however, 
slightly different. First of all, it is distributed: instead of a global control, each 
player designs and implements an individual strategy. And secondly, these strate- 
gies are not designed using feedback, to respond to the outputs, but rather to 
respond to the inputs supplied by the other players: 



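$$
\begin{array}{cl}
A \times X ⇸ B \times X & \\ \hline
A_{-i} \times X_i \overset{RS_i}{⇸} A_i & (\star) \\ \hline
A \times X \overset{RS}{⇸} A & (\star\star) \\ \hline
X \overset{RS^\bullet}{⇸} A & (\ddagger) \quad RS^\bullet = \mathrm{Fix}_A(RS)
\end{array}
$$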




At step (★), each player i implements her rationality through a response relation $RS_i$ to all of the opponents' moves, which are chosen from

$$A_{-i} = \prod_{k \in m,\ k \neq i} A_k$$

At step (★★), the individual response relations $RS_i$ are gathered into the response profile RS, which is simply the m-tuple of the relations $RS_i$.



Finally, at step (‡), the equilibrium $RS^\bullet$ is computed, as the fixed point of the response profile RS. The equilibrium is an m-tuple of relations $X ⇸ A_i$. It tells, for each position from X, how the game will be played if each player responds by $RS_i$ to everyone else's responses $RS_{-i} = (RS_k \circ \pi_k)_{k \in m \setminus \{i\}}$. The equilibrium is thus the global (social) result of the local (individual) preferences and of distributed reasoning (programming) in pursuit of these preferences.

How does an equilibrium come about? The usual explanation is that each player i knows everyone's preferences, and can thus construct $RS_k$, for all $k \in m$, on her own, and thus compute the profile RS and the equilibrium $RS^\bullet = \mathrm{Fix}(RS)$. But this explanation should be taken as a metaphor. In reality, equilibria are often reached e.g. in biological systems, and in other genuinely distributed processes, where the agents do not perform explicit local computations, or reason about each other. Moreover, even in the cases where all strategies and their fixed points could conceivably be computed at each node, the fact that there are usually many equilibria gives rise to the question of how the players coordinate to meet at one equilibrium. This is where game theory enters the conceptual realm of "invisible hand" and equilibrium selection [20]. Modeling such genuinely distributed processes is one of the most interesting challenges of the computational semantics of gaming.

In the rest of the paper, we explore more closely each of the steps in the 
above derivation of the response strategies and equilibria. We begin with step 
(★), where each player "programs" her response to other players' possible moves. 

3 Strategies as nondeterministic programs 

To reconstruct the first concepts of standard game theory, we first consider one shot games, i.e. where X = 1. This is standard in traditional game theory, where it is assumed that each player chooses a strategy in advance, and plays it out no matter what. The notion of position, or state, is thus abstracted away. Another standard assumption is that the payoffs are uniquely determined for each player, by mapping each tuple of moves in A to a tuple of payoffs in B. The game thus boils down to a function $A \xrightarrow{\varrho} B$. If there are 2 players, called 0 and 1, and if their payoffs are in $B_0 = B_1 = \mathbb{R}$, this gives the usual bimatrix form $A_0 \times A_1 \xrightarrow{\varrho} \mathbb{R} \times \mathbb{R}$. If there are 3 players, the game can be viewed as 3 tri-matrices, etc.

Even if the game, represented by the functions that compute the payoffs, is 
completely deterministic, the fact that the players need to choose between the 
various possible moves makes their strategies into nondeterministic programs. 
In this section, we view them as relations; later we view them as stochastic 
relations. 

3.1 Designing and refining relational strategies 

We assume that the payoff types Bi are ordered, and that the players prefer 
higher payoffs. Each of them thus programs a response strategy towards the 



goal of maximizing his payoffs. We first consider some simple implementations, 
and then show how they can be refined. 

A. Best response simply maximizes the payoff

$$s_{-i}\ BR_i\ s_i \iff \forall t_i \in A_i.\ \varrho_i(t_i, s_{-i}) \le \varrho_i(s_i, s_{-i})$$

B. Stable response refines the view by taking the possible opponents' responses into account:

$$s_{-i}\ SR_i\ s_i \iff \forall t_i \in A_i.\ \varrho_i(t_i, s_{-i}) \le \varrho_i(s_i, s_{-i})\ \wedge\ \big(\varrho_i(t_i, s_{-i}) = \varrho_i(s_i, s_{-i}) \Rightarrow \forall t_{-i} \in A_{-i}.\ \varrho_i(t_i, t_{-i}) \le \varrho_i(s_i, t_{-i})\big)$$

The idea is that a stable response $s_i$ to $s_{-i}$ should remain optimal in some neighborhood of $s_{-i}$. When the payoff function $\varrho_i$ is linear, it is easy to prove that the above definition captures this. Indeed, if the opponents deviate from $s_{-i}$ and play $(1 - \varepsilon)s_{-i} + \varepsilon t_{-i}$ for a small $\varepsilon > 0$, then a stable response $s_i$ will still be the best, because the validity of

$$(1 - \varepsilon)\varrho_i(s_i, s_{-i}) + \varepsilon\varrho_i(s_i, t_{-i}) \ge (1 - \varepsilon)\varrho_i(t_i, s_{-i}) + \varepsilon\varrho_i(t_i, t_{-i})$$

for all $t_i$ follows from the above definition of $SR_i$.

C. Uniform response goes a step further by taking the opponents' best response into account:

$$s_{-i}\ UR_i\ s_i \iff s_{-i}\ BR_i\ s_i\ \wedge\ \forall t_{-i} \in A_{-i}.\ s_i\ BR_{-i}\ t_{-i} \Rightarrow t_{-i}\ BR_i\ s_i$$

where $s_i\ BR_{-i}\ t_{-i}$ abbreviates $\forall k \in m.\ k \neq i \Rightarrow (s_i, t_{-i-k})\ BR_k\ t_k$. The best response $s_i$ is thus required to remain optimal not only with respect to $s_{-i}$, but also with respect to the opponents' best responses to the profiles that include $s_i$. This is a rational, but very strong requirement: the relation $UR_i$ may be empty. We mention it as a first attempt to refine the response by anticipating opponents' responses to it. The next example proceeds in this direction, while assuring a nonempty set of responses.

D. Constructive response. While the uniform response captures the best responses to all of the opponent's responses, the constructive response relation also captures the responses that may not be the best responses to a fixed opponent's move, but are better than what the best response would turn into in the context of opponent's rational responses to it.

$$s_{-i}\ CR_i\ s_i \iff \forall t_i \in A_i.\ \varrho_i(s_i, s_{-i}) < \varrho_i(t_i, s_{-i}) \Rightarrow \exists t_{-i} \in A_{-i}.\ \varrho_{-i}(t_i, t_{-i}) > \varrho_{-i}(t_i, s_{-i})\ \wedge\ \varrho_i(t_i, t_{-i}) < \varrho_i(s_i, s_{-i})$$



Example. Prisoners' Dilemma is a famous 2-player game, usually presented with a single state X = 1 and two moves, "cooperate" and "defect", thus $A_0 = A_1 = \{c, d\}$, and $B_0 = B_1 = \mathbb{R}$. Players' preferences are given by a payoff function $\varrho = \langle \varrho_0, \varrho_1 \rangle : \{c, d\} \times \{c, d\} \to \mathbb{R} \times \mathbb{R}$ which can be presented by the bimatrix

$$
\begin{array}{c|cc}
 & c & d \\ \hline
c & (10, 10) & (0, 11) \\
d & (11, 0) & (1, 1)
\end{array}
$$

with the moves of player 0 as rows, the moves of player 1 as columns, and the entries $(\varrho_0, \varrho_1)$, telling that $\varrho_0(c, c) = \varrho_1(c, c) = 10$, $\varrho_0(c, d) = 0$, $\varrho_1(c, d) = 11$ etc. The point is that players' local reasoning leads to a globally suboptimal outcome: each player seems forced to play d, because he wins more whether the opponent plays c or d; but if both players play d they both win less than if they both play c. The constructive response allows the players to keep the strategy c as a candidate solution. Although $\varrho_0(c, c)$ gives lower payoff than $\varrho_0(d, c)$, player 0 knows that the profile (d, c) is unlikely to happen, because $\varrho_1(d, d) > \varrho_1(d, c)$. So he keeps (c, c) as better than (d, d).

Of course, this form of rationality does not offer the worst case protection, and may not seem rational at all, because there is no guarantee that (c, c) will happen either. Indeed, $\varrho_1(c, d) > \varrho_1(c, c)$ is likely to motivate player 1 to defect, which leads to the worst outcome for player 0, since $\varrho_0(c, d) < \varrho_0(c, c)$.

However, if player 1 follows the rationality of CR, and not BR, then he'll also consider cooperating, because of the threat that player 0 would retaliate in response to his defection, and $\varrho_1(d, d) < \varrho_1(c, c)$. So the possibility of the solution (c, c) depends on whether the players share the same rationality CR. This sharing cannot be coordinated in the relational model of one-round Prisoners' Dilemma. We shall later see how some more precise models do allow this.
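The four response relations of section 3.1 can be sketched for two-player games as plain predicates, and evaluated on the Prisoners' Dilemma payoffs (10, 0, 11, 1) quoted in the example. The helper `pay` and the argument order `(i, other, own)` are conventions chosen for this illustration only, and the uniform response is specialized to two players, where the opponents' profile is a single move.

```python
MOVES = ("c", "d")

# PAYOFF[(s0, s1)] = (payoff to player 0, payoff to player 1)
PAYOFF = {("c", "c"): (10, 10), ("c", "d"): (0, 11),
          ("d", "c"): (11, 0), ("d", "d"): (1, 1)}

def pay(i, own, other):
    """Payoff to player i when she plays `own` and the opponent plays `other`."""
    profile = (own, other) if i == 0 else (other, own)
    return PAYOFF[profile][i]

def BR(i, other, own):
    """Best response: no alternative move pays more against `other`."""
    return all(pay(i, t, other) <= pay(i, own, other) for t in MOVES)

def SR(i, other, own):
    """Stable response: a best response that also wins all ties elsewhere."""
    return all(pay(i, t, other) <= pay(i, own, other)
               and (pay(i, t, other) != pay(i, own, other)
                    or all(pay(i, t, u) <= pay(i, own, u) for u in MOVES))
               for t in MOVES)

def UR(i, other, own):
    """Uniform response: stays best against the opponent's best responses."""
    return BR(i, other, own) and all(BR(i, u, own)
                                     for u in MOVES if BR(1 - i, own, u))

def CR(i, other, own):
    """Constructive response: any better move t is deterred by a countermove u
    that the opponent prefers and that leaves player i worse off."""
    return all(not pay(i, own, other) < pay(i, t, other)
               or any(pay(1 - i, u, t) > pay(1 - i, other, t)
                      and pay(i, t, u) < pay(i, own, other)
                      for u in MOVES)
               for t in MOVES)

# Responses of player 0 to each opponent move, under BR and under CR.
br_table = {o: [s for s in MOVES if BR(0, o, s)] for o in MOVES}
cr_table = {o: [s for s in MOVES if CR(0, o, s)] for o in MOVES}
```

Under these definitions the best response to either move is d alone, while the constructive response to c retains both c and d, which is the point made in the example.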

3.2 Playing out the strategies: computing the equilibria 

For each of the described notions of response, we now consider the corresponding notion of equilibrium, derived at step (‡) in section 2.3. The relational fixed point operators are described in the Appendix.

A. Rationalizability and the Nash equilibrium. The Nash best response relations yield the system of n relations BR, which we write as

$$s\ BR\ t \iff \forall i \in n.\ s_{-i}\ BR_i\ t_i$$

It is not hard to see that the strong fixed point (see Appendix) yields the solutions of that system, i.e.

$$BR^\bullet s \iff s\ BR\ s \iff \forall i \in n.\ s_{-i}\ BR_i\ s_i$$

This, of course, means that s is a Nash equilibrium [14]. On the other hand, the weak fixed point extracts the transitive closure of BR, i.e. the smallest relation $BR^*$ satisfying

$$BR^* s \iff \exists t.\ BR^* t \wedge t\ BR\ s$$

In game theory, the strategies $\{s_i \mid BR^* s_i\}$ are said to be rationalizable [5,17]. An $s_i$ is rationalizable if and only if there is a rationalizable counterstrategy $t_{-i}$ for which $s_i$ is the best response.

B. Stable Strategies. Given

$$s\ SR\ t \iff \forall i \in n.\ s_{-i}\ SR_i\ t_i$$

the fixed point

$$SR^\bullet s \iff \forall i \in n.\ s_{-i}\ SR_i\ s_i \iff \forall t \in A.\ \forall i \in n.\ \varrho_i(t_i, s_{-i}) \le \varrho_i(s_i, s_{-i})\ \wedge\ \big(\varrho_i(t_i, s_{-i}) = \varrho_i(s_i, s_{-i}) \Rightarrow \forall t_{-i} \in A_{-i}.\ \varrho_i(t_i, t_{-i}) \le \varrho_i(s_i, t_{-i})\big)$$

is an evolutionary stable strategy, which is a straightforward generalization of the concept due to biologist John Maynard Smith [12].¹ On the other hand, the weak fixed point

$$SR^* s \iff \exists t.\ SR^* t \wedge t\ SR\ s$$

yields the new class of stably rationalizable strategies. Unfolding the above equivalence tells that s is stably rationalizable iff every $s_i$ is the best response for some stably rationalizable $t_{-i}$, and moreover, whenever $t_i$ is another best response to $t_{-i}$, as good as $s_i$, then $s_i$ is at least as good as $t_i$ with respect to the other counterstrategies.

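C. Uniform equilibria and profiles.

$$UR^\bullet(s_i, s_{-i}) \iff \forall i \in n.\ s_{-i}\ UR_i\ s_i \iff \forall i \in n.\ s_{-i}\ BR_i\ s_i\ \wedge\ \forall t_{-i} \in A_{-i}.\ s_i\ BR_{-i}\ t_{-i} \Rightarrow t_{-i}\ BR_i\ s_i$$

where $BR_{-i}$ is as in section 3.1.C. A uniform equilibrium s is thus a Nash equilibrium such that each of its components $s_i$ is a uniform move, in the sense that it lies in the set

$$U_i = \{s_i \in A_i \mid \forall t_{-i} \in A_{-i}.\ s_i\ BR_{-i}\ t_{-i} \Rightarrow t_{-i}\ BR_i\ s_i\}$$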



¹ He considered the symmetric case, where all players have the same preferences and the same choice of actions.



A Nash equilibrium thus fails to be uniform whenever some opponent has an alternative best response. The uniformity of the i-th player's response assures that it is the best response also with respect to such alternatives. In a sense, the uniformity requirement only eliminates the unreliable Nash equilibria from the search space.

The weak fixed point

$$UR^* s \iff \exists t.\ UR^* t \wedge t\ UR\ s$$

yields the new class of uniformly rationalizable strategies. Unfolding the above equivalence tells that s is uniformly rationalizable iff every $s_i$ is a uniform best response for some uniformly rationalizable $t_{-i}$.

D. Constructive equilibrium.

$$CR^\bullet s \iff \forall i \in n.\ s_{-i}\ CR_i\ s_i$$

As it stands, this equilibrium includes the Nash equilibria, and the fixed points of CR, chosen because they yield better payoff than the equilibria. While CR itself does not guarantee the feasibility of any $CR_i$-response of a particular player, the CR-equilibrium does guarantee that all players have the same CR-justification.

Remark. The above characterizations of equilibria provide no existence guarantees: e.g., the set $BR^\bullet$ of the Nash equilibria, of course, always exists, but it can be empty. The existence, of course, requires additional side conditions, such as the convexity of the set of strategies [14].
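For a finite two-player game, the strong fixed points described above can be enumerated directly, by keeping the profiles s in which every component is a response to the rest. The sketch below does this for the best response and for the constructive response on the Prisoners' Dilemma payoffs of section 3.1. It reads the strong fixed point simply as "s R s", as unfolded in the text, and does not attempt to reproduce the fixed point operators of the Appendix; all function names are illustrative.

```python
from itertools import product

MOVES = ("c", "d")
PAYOFF = {("c", "c"): (10, 10), ("c", "d"): (0, 11),
          ("d", "c"): (11, 0), ("d", "d"): (1, 1)}

def pay(i, own, other):
    """Payoff to player i when she plays `own` and the opponent plays `other`."""
    profile = (own, other) if i == 0 else (other, own)
    return PAYOFF[profile][i]

def BR(i, other, own):
    """Best response relation of player i."""
    return all(pay(i, t, other) <= pay(i, own, other) for t in MOVES)

def CR(i, other, own):
    """Constructive response relation of player i."""
    return all(not pay(i, own, other) < pay(i, t, other)
               or any(pay(1 - i, u, t) > pay(1 - i, other, t)
                      and pay(i, t, u) < pay(i, own, other)
                      for u in MOVES)
               for t in MOVES)

def equilibria(response):
    """Profiles s such that each s_i is a `response` to s_{-i}."""
    return [s for s in product(MOVES, repeat=2)
            if all(response(i, s[1 - i], s[i]) for i in (0, 1))]

nash = equilibria(BR)
constructive = equilibria(CR)
```

With these payoffs the enumeration returns an empty list whenever no profile qualifies, which illustrates the remark that the set of equilibria always exists but may be empty.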

Example. For Prisoners' Dilemma, both (c, c) and (d, d) are constructive equilibria. The former is unstable, since each player can improve her immediate payoff by defecting. This gain can be offset by the loss from retaliation, and can be irrational, especially if the value of (c, c) is much larger than the value of (d, d).

But the relational view of the strategic choices cannot express these quantitative considerations. In the next section, we explore a refinement where they can be expressed.

4 Strategies as randomized programs

In this section, we consider the framework where the preferences are quantified: the strategic choices are expressed as probability distributions over the available moves. A strategy is thus a randomized program.² Is it possible to improve the rationality of strategic behaviors by quantifying the preferences, and biasing them more towards the more favorable moves?

² The payoff functions can also be viewed as randomized programs, capturing games that involve some form of gambling. But this leads to an essentially different type of game theory [9].



In the standard game theoretic reasoning, the payoffs are only used as a 
convenient way to express players' preference ordering. Indeed, any affine trans- 
formation of a payoff matrix represents the same game. In the present section, 
this is not the case any more. We assume that all payoffs are non-negative, and 
normalize them into probability distributions. 

A. Best response distribution is just a normalization of the payoff function:

$$s_{-i}\ BD_i\ s_i = \frac{\varrho_i(s_i, s_{-i})}{\sum_{t_i \in A_i} \varrho_i(t_i, s_{-i})}$$

where $A_i \times A_{-i} \xrightarrow{BD_i} [0, 1]$ is viewed as a fuzzy relation $A_{-i} \overset{BD_i}{⇸} A_i$. The idea is that $s_{-i}\ BD_i\ s_i$ (which can be viewed as the matrix entry in the row $s_i$ and the column $s_{-i}$) records not only that $s_i$ is the best response to $s_{-i}$, like $BR_i$ did, i.e. not just that $s_i$ is preferred to the other responses $t_i$; but $s_{-i}\ BD_i\ s_i$ also quantifies how much better $s_i$ is than $t_i$, in terms of the difference $(s_{-i}\ BD_i\ s_i) - (s_{-i}\ BD_i\ t_i)$.

B. Stable response distribution measures not only how good $s_i$ is as a response to $s_{-i}$, but also how good it is, on the average, with respect to the other countermoves $t_{-i}$.

$$s_{-i}\ SD_i\ s_i = \frac{\varrho_i(s_i, s_{-i}) \cdot \sum_{t_{-i} \in A_{-i}} \varrho_i(s_i, t_{-i})}{\sum_{t_i \in A_i} \varrho_i(t_i, s_{-i}) \cdot \sum_{t_{-i} \in A_{-i}} \varrho_i(t_i, t_{-i})}$$

Like in the case of the relational stable response, if $s_i$ and $t_i$ are equally good as responses to $s_{-i}$, then $s_i$ remains a stable response if it is at least as good as $t_i$ with respect to all other countermoves $t_{-i}$. Moreover, $s_i$ will now remain stable even if it is not as good as $t_i$ with respect to each other $t_{-i}$, but just if it is as good on the average. In fact, if $s_i$ is much better than $t_i$ on the average, the probability $s_{-i}\ SD_i\ s_i$ may be greater than $s_{-i}\ SD_i\ t_i$, even if $\varrho_i(s_i, s_{-i}) < \varrho_i(t_i, s_{-i})$.

C. Uniform response distribution multiplies the probability of a response $s_i$ to $s_{-i}$ by the payoffs from the response $s_i$ to all other countermoves $t_{-i}$, averaged by the likelihood that $t_{-i}$ may occur as the countermove against $s_i$, which is taken to be proportional to $\varrho_{-i}(s_i, t_{-i})$.

$$s_{-i}\ UD_i\ s_i = \frac{\varrho_i(s_i, s_{-i}) \cdot \sum_{t_{-i} \in A_{-i}} \varrho_i(s_i, t_{-i}) \cdot \varrho_{-i}(s_i, t_{-i})}{\sum_{t_i \in A_i} \varrho_i(t_i, s_{-i}) \cdot \sum_{t_{-i} \in A_{-i}} \varrho_i(t_i, t_{-i}) \cdot \varrho_{-i}(t_i, t_{-i})}$$

If it happens that $s_i$ is a good response across all the best countermoves $t_{-i}$ against it, then $s_i$ is assigned a high uniform response probability. This was expressed in the uniform response relation too. The preference is now not only quantified, but also smoothened out, so that $s_i$ has a high uniform response probability as soon as it is a good response to the likely countermoves just on the average.
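The response distributions A, B and C can be evaluated mechanically. The following sketch computes them for a two-player game with exact fractions, using the Prisoners' Dilemma payoffs of section 3.1 as data. It assumes non-negative payoffs whose total is positive in every column, and specializes the opponents' profile to a single move; the function names are chosen for this illustration.

```python
from fractions import Fraction

MOVES = ("c", "d")
PAYOFF = {("c", "c"): (10, 10), ("c", "d"): (0, 11),
          ("d", "c"): (11, 0), ("d", "d"): (1, 1)}

def pay(i, own, other):
    """Payoff to player i when she plays `own` and the opponent plays `other`."""
    profile = (own, other) if i == 0 else (other, own)
    return PAYOFF[profile][i]

def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights.values())
    return {s: Fraction(w, total) for s, w in weights.items()}

def BD(i, other):
    """Best response distribution: payoffs against `other`, normalized."""
    return normalize({s: pay(i, s, other) for s in MOVES})

def SD(i, other):
    """Stable response distribution: payoff times total payoff over countermoves."""
    return normalize({s: pay(i, s, other) * sum(pay(i, s, u) for u in MOVES)
                      for s in MOVES})

def UD(i, other):
    """Uniform response distribution: countermoves weighted by opponent's payoff."""
    return normalize({s: pay(i, s, other)
                         * sum(pay(i, s, u) * pay(1 - i, u, s) for u in MOVES)
                      for s in MOVES})

bd_c, sd_c, ud_c = BD(0, "c"), SD(0, "c"), UD(0, "c")
ud_d = UD(0, "d")
```

Against a cooperating opponent, the best response and stable response distributions computed this way still favor defection, while the uniform response distribution puts almost all weight on cooperation. Against a defecting opponent all three put weight 1 on d, since cooperation pays 0 there.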



D. Constructive response distribution. To simplify notation for a sequence of values $(f(s, y))_{s \in A}$ renormalized into a probability distribution over A, we shall henceforth write

$$\lfloor f(s, y) \rfloor_s = \frac{f(s, y)}{\sum_{t \in A} f(t, y)}$$

The upshot is that we get $\sum_{s \in A} \lfloor f(s, y) \rfloor_s = 1$. The subscript s, denoting the renormalized variable, will be omitted when clear from the context. We also write

$$a_+ = \begin{cases} a & \text{if } a > 0 \\ 0 & \text{otherwise} \end{cases} \ = \ \frac{|a| + a}{2}$$

Now define

[definition of $s_{-i}\ CD_i\ s_i$, ranging over $t_{-i} \in A_{-i}$; the formula did not survive extraction]

The idea behind constructive distribution is that the probabilistic weight of $s_i$ as a response to $s_{-i}$ is now increased to equal the weight of a $t_i$ that may be a better response to $s_{-i}$ alone, but for which there is a threat of the countermoves $t_{-i}$, which are better for the opponent than $s_{-i}$, but worse for the player.


Examples. Response distributions for Prisoners' Dilemma are now

[display of the response distribution matrices; only the label $SD_i$ and the entries $\frac{100}{101}$ and $\frac{11}{1011}$ survive extraction]

where the columns represent the opponent's moves c and d, the rows the player's own responses, and the entries the suggested probability for each response.



Stochastic equilibria. Stochastic response profiles are Markov chains, and the 
induced equilibria are their stationary distributions. Playing out the randomized 
response strategies and computing stochastic equilibria is thus placed in a rich 
and well ploughed field [IB] . 



A stochastic Nash equilibrium is a uniform fixed point $BD^\bullet = \mathrm{Fix}(BD)$, which can be computed as in Appendix B. Since each player participating in the profile BD responds by a mixed strategy where the frequency of a move is proportional to the payoff that it yields, the condition $BD^\bullet = BD \circ BD^\bullet$ means that $BD^\bullet$ maximizes everyone's average payoff. Formally, this is a consequence of the fact that BD is a stochastic matrix, and that 1 is its greatest eigenvalue, so that the images of the vectors in the eigenspace of 1 are of maximal length.

The stochastic equilibria $SD^\bullet$, $UD^\bullet$ and $CD^\bullet$ maximize players' average payoffs in a similar manner, albeit for more refined notions of averaging, captured by their more refined response distributions.



Back to the Dilemma. The strategies UD and CD above recommend cooperation as a better response to peer's cooperation. One might thus hope that, by taking into account the average payoffs, the stochastic approach may overcome the myopic rationality of defection as the equilibrium in Prisoners' Dilemma. Unfortunately, it is easy to see that the only fixed point of any response distribution in the form $RD = \begin{pmatrix} p & 0 \\ 1-p & 1 \end{pmatrix}$ is the vector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, as soon as p < 1. Defection is the only equilibrium.
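The collapse to defection can be observed by iterating a column-stochastic matrix with a zero in the upper right corner. In this sketch the value p = 9/10 is a hypothetical choice, and plain repeated multiplication stands in for whatever fixed point computation the Appendix prescribes.

```python
from fractions import Fraction

def apply(RD, v):
    """One step of the chain: multiply the column-stochastic matrix RD by v."""
    return [sum(RD[j][i] * v[i] for i in range(len(v)))
            for j in range(len(RD))]

def iterate(RD, v, n):
    """Apply RD to the distribution v, n times."""
    for _ in range(n):
        v = apply(RD, v)
    return v

p = Fraction(9, 10)                 # hypothetical probability of cooperating
RD = [[p, 0],
      [1 - p, 1]]                   # rows: responses c, d; columns: moves c, d

start = [1, 0]                      # begin with certain cooperation
after_50 = iterate(RD, start, 50)   # equals [p**50, 1 - p**50]
fixed = apply(RD, [0, 1])           # pure defection is left unchanged
```

After n steps the weight of cooperation is exactly p to the power n, so every starting distribution drifts to the vector (0, 1) whenever p < 1.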

Let us try to understand why. Suppose that it is assured that both players play a constructive strategy. If they both assume that the other one will cooperate, each of them will cooperate at the first step with a probability which seems favorable. However, under the same assumption, the probability that either of them will cooperate at both of the first two steps is the square of that probability, which is not so favorable. And it exponentially converges to 0. With a static strategy repeated over and over, any probability p that the opponent will cooperate in one step leads to the probability $p^n$ that he will cooperate in n steps, which becomes 0 in the long run. Trust and cooperation require memory and adaptation, which can be implemented in position games.



5 Position and memory 



The positions in a position game $A \times X ⇸ B \times X$ are recorded in the state space $X = \prod_{i \in m} X_i$, where the projection $X_i$ shows what is visible to the player i. In games of perfect information, all of X is visible to all players. Even if the payoff function $A \times X \xrightarrow{\varrho_B} B$ does not depend on the position, the players can use the positions to adapt their strategies, and $A \times X \xrightarrow{\varrho_X} X$ should update the position as each move is made.

For instance, in Iterated Prisoners' Dilemma, each player chooses a sequence of moves $a_i = (s_i^0, s_i^1, \ldots, s_i^n)$ and collects at each step the payoff $\varrho(s_0^k, s_1^k)$. But the moves $s_i^k$ can be chosen adaptively, taking into account the previous $\ell$ moves. These moves can be recorded as the position. E.g., set $X = (\{c, d\} \times \{c, d\})^*$, and besides the payoff bimatrix $\{c, d\}^2 \xrightarrow{\varrho_B} \mathbb{R}^2$ declare the position update $\{c, d\}^2 \times (\{c, d\}^2)^* \xrightarrow{\varrho_X} (\{c, d\}^2)^*$ to be the list function $\varrho_X(s, x) = s :: x$. What are the rational strategies now?

Axelrod reports about the Iterated Prisoners' Dilemma tournaments in [3]. E.g., one of the simplest and most successful strategies was tit-for-tat. It uses a rudimentary notion of position, recording just the last move: i.e., $X = \{c, d\}^2$, and the position update $\{c, d\}^2 \xrightarrow{\varrho_X} \{c, d\}^2$ is the identity function. The tit-for-tat strategy $X \overset{TT_i}{⇸} A_i$ is simply to repeat opponent's last move:

$$(x_0, x_1)\ TT_0\ x_1 \qquad (x_0, x_1)\ TT_1\ x_0$$

If both players stick with this strategy, then they will 

— either forever cooperate, or forever defect — if they agree initially, 

— or forever alternate — if they initially disagree. 
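The two cases above can be checked with a short simulation, in which the position is the pair of last moves and each player copies the opponent's component. The function names are illustrative, and the initial position is treated as the pair of moves played just before the first simulated round.

```python
def tit_for_tat(i, position):
    """Player i repeats the opponent's last move recorded in the position."""
    return position[1 - i]

def play(position, rounds):
    """Both players follow tit-for-tat; the position is the last pair of moves."""
    history = []
    for _ in range(rounds):
        moves = (tit_for_tat(0, position), tit_for_tat(1, position))
        history.append(moves)
        position = moves        # position update: remember only the last moves
    return history

agree = play(("c", "c"), 3)
disagree = play(("c", "d"), 4)
```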

Within an n-round iterated game, this is clearly not an equilibrium strategy, 
since each player can win that game by switching any c to d. However, both 
players' total gains in the game will be higher if they cooperate. That is why 
a cooperative strategy may be rational when the cumulative gains within a 
tournament are taken into account, while it may not be a rational way to win a 
single party of the same game, or to assure a higher payoff from a single move. 

The upshot is that a game, viewed in strategic form, may thus lead to three 
completely different games, depending on whether the payoffs are recorded per 
move, or per n rounds against the same opponent, or per tournament against 
many opponents playing different strategies. While the different situations ar- 
guably determine different rationalities, which can be captured by different nor- 
mal forms, the process view of a game, with the various positions through which 
it may evolve, displays not only the semantical relations between the different 
views of the same game, but also a dynamical view of adaptive strategies. 

As a final example, consider a version of Iterated Prisoners' Dilemma, where 
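the positions $X = \mathbb{R} \times \mathbb{R}$ record the cumulative gains of both players. The
cumulative payoff function and the position update function thus happen to be
identical, $\{c,d\}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$. To give the game a sense of the moment, let us
assume that the gains are subject to a galloping inflation rate of 50% per round,
i.e. that the cumulative payoffs are given by the bimatrix

$$\begin{array}{c|cc}
 & c & d \\ \hline
c & \left(10 + \frac{x_0}{2},\ 10 + \frac{x_1}{2}\right) & \left(\frac{x_0}{2},\ 11 + \frac{x_1}{2}\right) \\
d & \left(11 + \frac{x_0}{2},\ \frac{x_1}{2}\right) & \left(1 + \frac{x_0}{2},\ 1 + \frac{x_1}{2}\right)
\end{array}$$

where $x = (x_0, x_1) \in \mathbb{R}^2$ is the position, i.e. the previous gains. Suppose that the
player uses the position-sensitive form of the constructive rationality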



{s-i,x) CDi 



+ V {^Qi{U-,S-i,x) - Qi{Si,S-i,x)^ ^ 



where $\xi = \varrho_X(t_i, s_{-i}, x)$ is the position reached after the profile $(t_i, s_{-i})$ is played
at the position $x$. The response distribution is then

$$CD_i(x) = \begin{pmatrix} \frac{38+x}{60+2x} & \frac{x}{2+2x} \\[4pt] \frac{22+x}{60+2x} & \frac{2+x}{2+2x} \end{pmatrix}$$



Since x changes at each step, the profile CD is not a Markov chain any more. 
Its fixed point is cumbersome to compute explicitly, although it converges fast 
numerically. In any case, it is intuitively clear that the high inflation rate moti- 
vates the players to cooperate. If they do, the cumulative payoff for each of them 
approaches $\$10 \cdot \sum_{k=0}^{\infty} \frac{1}{2^k} = \$20$. If they both defect, their cumulative payoffs
are $\$1 \cdot \sum_{k=0}^{\infty} \frac{1}{2^k} = \$2$. If they begin to cooperate and accumulate $\$x$ each, and
then one defects, he will acquire an advantage of \$11 for that move. But after 10
further moves, with both players defecting, his advantage will reduce to about 1
cent, and the cumulative payoff for both players will again boil down to \$2.
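
These figures can be reproduced numerically. The sketch below assumes the update rule read off from the bimatrix above, namely that each round's new cumulative gain is the payoff of the move plus half of the previous gain, with per-move payoffs 10, 11, 0 and 1. The names `PAYOFF`, `step` and `run` are hypothetical.

```python
# Assumed per-move payoffs: (player 0's gain, player 1's gain).
PAYOFF = {
    ("c", "c"): (10.0, 10.0),
    ("c", "d"): (0.0, 11.0),
    ("d", "c"): (11.0, 0.0),
    ("d", "d"): (1.0, 1.0),
}


def step(position, profile):
    """One round: new cumulative gain = payoff + half of the old gain."""
    gain = PAYOFF[profile]
    return (gain[0] + position[0] / 2, gain[1] + position[1] / 2)


def run(position, profiles):
    """Play a sequence of profiles starting from the given position."""
    for profile in profiles:
        position = step(position, profile)
    return position


cooperate = run((0.0, 0.0), [("c", "c")] * 60)   # long run of cooperation
defect = run((0.0, 0.0), [("d", "d")] * 60)      # long run of defection

# Player 0 defects once after a long run of cooperation ...
after_defection = step(cooperate, ("d", "c"))
advantage_now = after_defection[0] - after_defection[1]

# ... and then both players defect for 10 further rounds.
later = run(after_defection, [("d", "d")] * 10)
advantage_later = later[0] - later[1]

print(cooperate, defect, advantage_now, advantage_later, later)
```

Since the difference between the two cumulative gains is halved in every round of mutual defection, the initial advantage of 11 shrinks to $11/2^{10}$, roughly one cent, after 10 rounds.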



6 Conclusions and future work 

We explored the semantical approaches to gaming from three directions: through
relational programming of strategies in section 3, through quantifying preferences
in terms of distributions (rather than preorders) in section 4, and finally by
taking into account the positions and the process aspects of gaming, in section 5.

The advantage of viewing strategies as programs is that they can be refined, 
together with the notion of rationality that they express. To illustrate this point, 
we discussed in section 3 some simple refinements of the standard equilibrium
concepts. 

The advantage of viewing strategies as randomized programs is that the prob- 
lem of equilibrium selection [5D] can be attacked by the Markov chain methods. 
Mixed strategies are, of course, commonly used in game theory. They assure the 
existence of Nash equilibria. The mixture is interpreted either as a mixed popu-
lation of players, each playing a single strategy, or as the probability distribution
with which the single player chooses a single strategy [12]. However, when equi-
libria are computed as stationary distributions of Markov chains, the mixture
provides additional information which can be used to coordinate equilibrium se- 
lection. The concrete methods to extract and use this information need to be 
worked out in future research. 

The most interesting feature of the semantical view of games is the dynam- 
ics of gaming, as it evolves from position to position. This feature has only 
been touched upon in the present paper. On one hand, it leads beyond the
Markovian realm, and equilibria are harder to compute. But on the other hand, 
in practice, the important rational solutions are often attained through genuinely 
adaptive, position sensitive strategies. The toy example of Prisoners' Dilemma 
already shows that a widely studied science of rationality may miss even the basic 
forms of social rationality because of small technical shortcomings. Combining 
semantics of computation and game theory may help eliminate them. 


A Appendix: Fixed points in FRel

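For a relation $R : X \times A \to A$ in the monoidal category $(\mathsf{FRel}, \times, 1)$, the standard
fixed point operator (induced by its simple trace structure) gives $R^\bullet : X \to A$,

$$x R^\bullet a \iff (x, a) R a$$

On the other hand, the order structure of FRel induces another fixed point opera-
tor, where $R^* : X \to A$ is defined as the smallest relation satisfying $\langle \mathrm{id}, R^* \rangle R = R^*$,
i.e.

$$x R^* a \iff \exists c \in A.\ x R^* c \wedge (x, c) R a$$

For each $x$, the set $xR^* = \{a \mid x R^* a\}$ is just the image of the transitive closure
of $xR : A \to A$. It can be defined inductively, as

$$x R^* a \iff \forall n.\ x R^n a$$

where $x R^0 a$ holds for all $a \in A$ and

$$x R^{n+1} a \iff \exists c \in A.\ x R^n c \wedge (x, c) R a$$

or in terms of the image $xR(C) = \{a \mid \exists c \in C.\ (x, c) R a\}$ and its iterates $xR^{n+1}(C) = xR(xR^n(C))$,

$$xR^* = \bigcap_{n=1}^{\infty} xR^n(A)$$

If the containment order on $\wp A$ represents information, so the singletons $\{a\}$ are
maxima, and $A$ is the minimum, then the above intersection is the least upper
bound, and $R^*$ is the least fixed point. Indeed, the containment $R^\bullet \subseteq R^*$ means
that in the information order $R^* \sqsubseteq R^\bullet$.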
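
For finite sets the intersection of the iterated images can be computed by repeating the image operation until it stabilizes. The following Python sketch is an illustration under stated assumptions: a relation from $X \times A$ to $A$ is encoded as a set of triples `(x, c, a)`, the names `image`, `star` and `self_related` are hypothetical, and `self_related` collects the elements $a$ with $(x, a) R a$, which is assumed here to be the reading of the trace-induced fixed points.

```python
def image(R, x, C):
    """xR(C): all a such that (x, c) R a for some c in C."""
    return {a for (y, c, a) in R if y == x and c in C}


def star(R, x, A):
    """Intersection of the images xR^n(A) for n >= 1.

    The images form a decreasing chain of subsets of the finite set A,
    so the chain stabilizes, and its limit is the intersection.
    """
    current = image(R, x, set(A))
    while True:
        nxt = image(R, x, current)
        if nxt == current:
            return current
        current = nxt


def self_related(R, x):
    """All a with (x, a) R a (assumed reading of the trace fixed points)."""
    return {a for (y, c, a) in R if y == x and c == a}


A = {0, 1, 2, 3, 4}
# For the single parameter "x": 0 -> 0, 1 -> 2, 2 -> 1, 3 -> 0, 4 -> 3.
R = {("x", 0, 0), ("x", 1, 2), ("x", 2, 1), ("x", 3, 0), ("x", 4, 3)}

limit = star(R, "x", A)
loops = self_related(R, "x")
print(limit, loops)
```

In this example the limit keeps the self-loop at 0 together with the 2-cycle between 1 and 2, whereas only 0 is related to itself. The limit is invariant under the image operation, and it contains every self-related element.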



B Appendix: Fixed points in SRel 



A stochastic matrix $H : k \times m \to k$ can be viewed as an $m$-tuple of square stochas-
tic matrices $H_i : k \to k$. By the Perron-Frobenius theorem, each $H_i$ has 1 as the
principal eigenvalue. This can also be directly derived from the fact that the
rows of each $H_i - I$ must be linearly dependent, since the sum of the entries of
each of its columns is 0. The fixed vectors of each $H_i$ thus lie in its eigenspace of
1. But this space may be of a high dimension. Which $m$-tuple of vectors is the
uniform fixed point of $H$ [8,21]?
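
Both observations, the vanishing column sums and the possibly high dimension of the eigenspace, can be seen on a small example. The sketch below is an illustration in plain Python: the matrix `H_i` is a made-up column-stochastic matrix, the helper names are hypothetical, and plain iteration is used only because it happens to converge for this example, which it need not do in general.

```python
def apply(H, v):
    """Matrix-vector product; H[u][w] is the entry in row u, column w."""
    return [sum(H[u][w] * v[w] for w in range(len(v))) for u in range(len(H))]


def column_sums_minus_identity(H):
    """Column sums of H - I; all zero when every column of H sums to 1."""
    k = len(H)
    return [
        sum(H[u][w] - (1.0 if u == w else 0.0) for u in range(k))
        for w in range(k)
    ]


def iterate(H, v, steps=50):
    """Apply H repeatedly to the vector v."""
    for _ in range(steps):
        v = apply(H, v)
    return v


# A made-up column-stochastic matrix with two absorbing states (0 and 1).
H_i = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0],
]

sums = column_sums_minus_identity(H_i)
fixed_a = iterate(H_i, [1.0, 0.0, 0.0])
fixed_b = iterate(H_i, [0.0, 0.0, 1.0])
print(sums, fixed_a, fixed_b)
```

The two starting distributions lead to two different fixed vectors, so the eigenspace of 1 has dimension greater than one here, and an additional criterion is needed to single out one of them.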

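The uniform fixed points arise from the trace operations [10,4]. Let $H^* : k \times m \to k$
be formed from the projectors $H_i^* : k \to k$ to the principal eigenspaces of $H_i : k \to k$.
The uniform fixed point $H^\bullet : m \to k$ of $H : k \times m \to k$ can be obtained by tracing
out $k$ in $\widetilde{H} = \langle H^*, H^* \rangle : k \times m \to k \times k$, defined by

$$\widetilde{h}_{(v,w)(u,i)} = h^*_{v(u,i)}\, h^*_{w(u,i)}$$

The uniform fixed point is thus $H^\bullet = \left(h^\bullet_{ui}\right)_{k \times m}$, where

$$h^\bullet_{ui} = \frac{\sum_{w \in k} \widetilde{h}_{(w,u)(w,i)}}{\sum_{v,w \in k} \widetilde{h}_{(w,v)(w,i)}} = \frac{\sum_{w \in k} h^*_{w(w,i)}\, h^*_{u(w,i)}}{\sum_{v,w \in k} h^*_{w(w,i)}\, h^*_{v(w,i)}}$$

To check that this is a fixed point of $H$, i.e. that $H \langle H^\bullet, I \rangle = H^\bullet : m \to k$,
note that $\widehat{H} = \langle H^\bullet, I \rangle : m \to k \times m$ is

$$\widehat{h}_{(v,j)i} = \begin{cases} h^\bullet_{vi} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Now $H \widehat{H} = H^\bullet : m \to k$ is satisfied iff

$$\sum_{v \in k} h_{u(v,i)} \sum_{w \in k} h^*_{w(w,i)}\, h^*_{v(w,i)} = \sum_{w \in k} h^*_{w(w,i)}\, h^*_{u(w,i)}$$

holds for each $i \in m$, $u \in k$. But this is valid because $\sum_{v \in k} h_{u(v,i)}\, h^*_{v(w,i)} = h^*_{u(w,i)}$,
i.e. $H_i H_i^* = H_i^*$ holds by the definition of $H^*$.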



